Debug Node Registration Issues
This guide helps you debug issues that occur when registering nodes (devices) with a private ESP RainMaker deployment. Node registration is typically performed using the ESP RainMaker Admin CLI, which generates device certificates and bulk-registers nodes via an AWS Batch job.
Before you start, have these ready:
- The request_id returned after running certs devicecert register
- The node_id(s) of the affected nodes
- The admin user ID (email) used to run the registration
- Approximate time when the registration was triggered
Step 1: Identify Your Symptom
| Symptom | Go to |
|---|---|
| Admin CLI generate command fails with an error | Admin CLI — Certificate Generation Errors |
| Admin CLI register command fails before submitting the job | Admin CLI — Registration Submission Errors |
| Registration job submitted but no confirmation email received | Registration Job Submitted — No Email or Status Unknown |
| getcertstatus shows FAILURE or some nodes failed | Registration Job Failed or Partial Failures |
| Registration job is stuck in REQUESTED or INPROGRESS for too long | Registration Job Stuck or Timed Out |
| Nodes registered but not visible on the RainMaker Dashboard | Nodes Not Visible on the RainMaker Dashboard |
| Node is registered but the device cannot connect to the cloud | Node Registered but Device Cannot Connect |
| Getting a specific error code (106xxx / 200xxx) | Error Code Reference |
Node Registration Overview
Understanding the flow helps you identify at which stage a failure occurred.
Step 1 Admin CLI generates certificates locally → node_certs.csv
Step 2 CLI calls GET /admin/node_certificates/register → gets S3 pre-signed URL + request_id
Step 3 CLI uploads node_certs.csv to S3
Step 4 CLI calls POST /admin/node_certificates/register → triggers AWS Batch job
Step 5 AWS Batch processes each node: creates IoT Thing, registers certificate,
attaches policy, writes to DynamoDB nodes_v3 table
Step 6 Admin receives email with job summary
Step 7 Device boots, connects to MQTT, publishes config → node visible on dashboard
Admin CLI — Certificate Generation Errors
These errors occur when running python rainmaker_admin_cli.py certs devicecert generate.
Check 1: Verify the command arguments
| Error Message | Cause | Fix |
|---|---|---|
| "Maximum of 50,000 nodes generation supported in a single request." | --count exceeds 50,000 | Split into multiple batches with --count ≤ 50000 |
| "<count> must be > 0" | Count is zero or negative | Provide a valid --count value |
| "'node_id' column not found in file" | --inputfile CSV is missing the node_id column | Ensure the input CSV has a header row with node_id as the column name |
| "CA key file is not provided" / "CA cert file is not provided" | Only one of --cacertfile / --cakeyfile was given | Provide both --cacertfile and --cakeyfile together |
| "At least one of the following must be provided: --count, ADDITIONAL_VALUES, --inputfile" | No node count source specified | Provide --count, --inputfile, or configure ADDITIONAL_VALUES in config/binary_config.ini |
Check 2: Verify the output directory
After a successful generate, confirm these files exist in the output directory:
<outdir>/<date>/Mfg-<N>/
    common/
        node_certs.csv    ← required for the next `register` step
        ca.crt            ← CA certificate
        node_ids.csv      ← list of generated node IDs
        endpoint.txt      ← MQTT broker hostname
    node_details/
        node-<idx>-<node_id>/
            node.crt      ← device certificate
            node.key      ← device private key
If any of these files are missing, re-run generate. If the output directory is missing entirely, the tool failed before writing any files — check for Python exceptions in the terminal output.
The node_certs.csv in common/ is the input file for the register command. Use the full path when calling register --inputfile.
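To check this layout from a script instead of by eye, you can use something like the following. It is a minimal sketch that assumes the directory layout shown above. batch_dir is the <outdir>/<date>/Mfg-<N> path, and find_missing_outputs is a hypothetical helper, not part of the Admin CLI.

```python
from pathlib import Path

# Files the generate step is expected to write under common/ (per the layout above).
COMMON_FILES = ("node_certs.csv", "ca.crt", "node_ids.csv", "endpoint.txt")
# Files expected inside each node_details/node-<idx>-<node_id>/ directory.
NODE_FILES = ("node.crt", "node.key")


def find_missing_outputs(batch_dir: Path) -> list[str]:
    """Return relative paths of expected files that are missing under batch_dir."""
    if not batch_dir.is_dir():
        return [str(batch_dir)]  # The output directory itself is missing.
    missing = [f"common/{name}" for name in COMMON_FILES
               if not (batch_dir / "common" / name).is_file()]
    # glob() on a non-existent directory simply yields nothing.
    node_dirs = sorted(p for p in (batch_dir / "node_details").glob("node-*") if p.is_dir())
    if not node_dirs:
        missing.append("node_details/node-<idx>-<node_id>/")
    for node_dir in node_dirs:
        for name in NODE_FILES:
            if not (node_dir / name).is_file():
                missing.append(f"node_details/{node_dir.name}/{name}")
    return missing
```

If find_missing_outputs(Path("<outdir>/<date>/Mfg-<N>")) returns an empty list, all expected files are present.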
Admin CLI — Registration Submission Errors
These errors occur when running python rainmaker_admin_cli.py certs devicecert register.
Check 1: Validate the input CSV
| Error Message | Cause | Fix |
|---|---|---|
| "Input file is invalid. Please provide file containing the certificates" | CSV has no certs column or all cert values are empty | Use the node_certs.csv generated by the generate step |
| "Column count mismatch in row N" | The CSV has inconsistent column counts | Open the CSV in a text editor and fix the row with index N |
| "Certificate CN 'X' does not match node_id 'Y'" | The certificate's Common Name does not match the node_id column | Regenerate the certificates; a CN mismatch means the cert and node ID are from different batches |
| "Invalid CSV file" (error 106026) | CSV format is malformed | Validate the CSV with a CSV linter; check for unescaped quotes or missing commas |
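You can pre-check a CSV before calling register with a minimal sketch like the one below. It assumes the node_id and certs column names used above and treats a cert value as valid only if it contains a PEM "BEGIN CERTIFICATE" marker. It does not check the CN against node_id, because that needs an X.509 parsing library. Row numbers here count data rows from 1 and may not match the CLI's own row index.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("node_id", "certs")


def validate_node_certs_csv(path: Path) -> list[str]:
    """Return human-readable problems found in a node_certs.csv-style file."""
    problems: list[str] = []
    with path.open(newline="") as f:
        reader = csv.reader(f)  # handles quoted multi-line PEM values
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        for column in REQUIRED_COLUMNS:
            if column not in header:
                problems.append(f"missing column '{column}'")
        certs_idx = header.index("certs") if "certs" in header else None
        for row_num, row in enumerate(reader, start=1):
            if len(row) != len(header):
                problems.append(f"row {row_num}: {len(row)} columns, header has {len(header)}")
            elif certs_idx is not None and "BEGIN CERTIFICATE" not in row[certs_idx]:
                problems.append(f"row {row_num}: certs value is empty or not PEM")
    return problems
```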
Check 2: Validate tags and policies
| Error Message | Cause | Fix |
|---|---|---|
| "Invalid tags specified by user. Check tags format." | Tags are not in key:value format | Use --tags key1:value1,key2:value2 |
| "Invalid tags specified by user. Check whether the tags are referencing the proper column names." | A tag references a CSV column that doesn't exist | Ensure the column name in --tags key:@column_name exactly matches a column in the CSV |
| "--node_policies option cannot be used together with --update_nodes." | Conflicting flags | Remove --node_policies when using --update_nodes |
| "Invalid value for --node_policies" | Unknown policy name | Valid values are mqtt and videostream |
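The tag rules above can be checked locally with a minimal sketch like the one below. It assumes comma-separated key:value pairs, where a value starting with @ names a CSV column. The CLI's own parser may handle whitespace, escaping, or extra colons differently. check_tags is a hypothetical helper.

```python
def check_tags(tags: str, csv_columns: list[str]) -> list[str]:
    """Validate a --tags string against the documented key:value format.

    Values starting with '@' are treated as references to CSV column names.
    """
    problems = []
    for pair in tags.split(","):
        key, sep, value = pair.partition(":")  # split at the first colon only
        if not sep or not key.strip() or not value.strip():
            problems.append(f"'{pair}' is not in key:value format")
        elif value.startswith("@") and value[1:] not in csv_columns:
            problems.append(f"'{pair}' references unknown column '{value[1:]}'")
    return problems
```

Column matching here is exact and case-sensitive, in line with the "exactly matches" requirement in the table.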
Check 3: Verify connectivity and authentication
| Error Message | Cause | Fix |
|---|---|---|
| "Could not connect. Please check your Internet connection." | Admin CLI cannot reach the RainMaker backend | Check your internet connection and verify that the server endpoint configured with account serverconfig is correct |
| "HTTP Request timed out." | Request took longer than 30 seconds | Retry. If this persists, check if the backend is reachable |
| "Failed to upload Device Certificates" | S3 pre-signed URL upload failed | The pre-signed URL may have expired (1-hour validity). Re-run register to get a fresh URL |
| "Request to register device certificate failed" | The POST /admin/node_certificates/register API call failed | Check the exact HTTP error code in the output. Run with verbose logging if available |
| "Unable to verify SSL certificate." | TLS verification failed | Verify that rmaker_admin_lib/server_cert/server_cert.pem is the correct certificate for your deployment |
When you successfully submit the registration job, the CLI prints a request_id. Save this value — you need it to check the job status later using getcertstatus --requestid <request_id>.
Registration Job Submitted — No Email or Status Unknown
If the registration job was submitted but you haven't received a confirmation email, or the status is unclear, follow these steps.
Step 1: Check the job status using the CLI
Run:
python rainmaker_admin_cli.py certs devicecert getcertstatus --requestid <request_id>
- success → Job completed. All nodes registered. If nodes are still not visible, see Nodes Not Visible on the RainMaker Dashboard.
- in_progress → Job is still running. Wait and check again. Large batches can take up to 10 hours.
- failure → Job failed. See Registration Job Failed or Partial Failures.
- No output / error → The request_id may be invalid, or the entry expired in DynamoDB (entries are kept for a limited time). Verify the request_id and check DynamoDB directly (Step 2).
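Instead of re-running the command by hand, you can poll it with backoff. The sketch below is minimal and assumes the getcertstatus output contains exactly one of the status words listed above. The real output format may differ, so adjust parse_status to match it. wait_for_job and its defaults are hypothetical; the 10-hour deadline mirrors the maximum job duration described in this guide.

```python
import subprocess
import sys
import time
from typing import Optional

# Status words documented for getcertstatus; the exact output format is an assumption.
STATUSES = ("success", "in_progress", "failure")


def parse_status(cli_output: str) -> Optional[str]:
    """Return the single documented status word found in the output, else None."""
    text = cli_output.lower()
    found = [s for s in STATUSES if s in text]
    return found[0] if len(found) == 1 else None


def wait_for_job(request_id: str, cli: str = "rainmaker_admin_cli.py",
                 first_wait_s: float = 60, max_wait_s: float = 1800,
                 deadline_s: float = 10 * 3600) -> Optional[str]:
    """Poll getcertstatus with exponential backoff until success/failure or the deadline."""
    start, wait = time.monotonic(), first_wait_s
    while time.monotonic() - start < deadline_s:
        result = subprocess.run(
            [sys.executable, cli, "certs", "devicecert", "getcertstatus",
             "--requestid", request_id],
            capture_output=True, text=True)
        status = parse_status(result.stdout + result.stderr)
        if status != "in_progress":
            return status  # success, failure, or None (unrecognised: check DynamoDB)
        time.sleep(wait)
        wait = min(wait * 2, max_wait_s)
    return "in_progress"  # still running when the deadline passed
```

A return value of None corresponds to the "No output / error" case above, so go to Step 2.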
Step 2: Check the request record in DynamoDB
Go to AWS Console → DynamoDB → Tables → admin_node_registration_requests.
Query with:
- Partition key (user_id): the admin user's Cognito user ID
- Sort key (request_id): the request ID from the CLI
What to look for:
| Field | What it tells you |
|---|---|
status | Current job state: REQUESTED, INPROGRESS, SUCCESS, FAILURE |
total_count | Total nodes in the uploaded CSV |
completed_count | Nodes successfully registered so far |
failed_count | Nodes that failed registration |
request_timestamp | When the job was submitted |
If no entry is found with that request_id, and the request is recent enough that the entry cannot have expired, the job was never recorded in DynamoDB. The POST /admin/node_certificates/register API call likely failed silently. Re-run the register command.
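You can read the same record programmatically. The sketch below is minimal and assumes boto3 with credentials and region for the deployment's AWS account, and assumes the table name is exactly admin_node_registration_requests. Some deployments may add a prefix. boto3 returns DynamoDB numbers as Decimal, so the counters are converted with int(). The "not yet processed" figure assumes the three counters do not overlap.

```python
from typing import Any, Optional


def fetch_request(user_id: str, request_id: str,
                  table_name: str = "admin_node_registration_requests") -> Optional[dict]:
    """Read one registration request record; returns None if no entry exists."""
    import boto3  # imported here so summarize_request() works without boto3 installed

    table = boto3.resource("dynamodb").Table(table_name)
    response = table.get_item(Key={"user_id": user_id, "request_id": request_id})
    return response.get("Item")


def summarize_request(item: Optional[dict[str, Any]]) -> str:
    """Turn the record's status and counters into a one-line summary."""
    if item is None:
        return "No entry: the register POST likely failed, or the entry expired."
    total = int(item.get("total_count", 0))
    done = int(item.get("completed_count", 0))
    failed = int(item.get("failed_count", 0))
    pending = total - done - failed  # assumes the counters are disjoint
    return (f"{item.get('status', 'UNKNOWN')}: {done}/{total} registered, "
            f"{failed} failed, {pending} not yet processed")
```

Example: print(summarize_request(fetch_request("<cognito_user_id>", "<request_id>"))).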
Step 3: Check the Lambda log for submission errors
Go to CloudWatch → Logs Insights, select /aws/lambda/esp-CertificateRegister, and run:
fields @timestamp, @message
| sort @timestamp asc
| filter @message like "<request_id>"
Look for:
- Successful job submission: a message containing "Submitted batch job" or the job ID
- Any error messages indicating why the submission failed
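The Logs Insights queries in this guide can also be run from a script. The minimal sketch below uses the CloudWatch Logs start_query and get_query_results calls through boto3. It assumes default credentials and region, and a 24-hour look-back window that is an arbitrary choice. start_query takes its time range in epoch seconds.

```python
import time


def run_insights_query(log_group: str, query: str, hours_back: float = 24,
                       poll_s: float = 2.0) -> list[dict[str, str]]:
    """Run a CloudWatch Logs Insights query and return rows as field->value dicts."""
    import boto3  # lazy import: flatten_row() below needs only the standard library

    logs = boto3.client("logs")
    end = int(time.time())
    start = end - int(hours_back * 3600)
    query_id = logs.start_query(logGroupName=log_group, startTime=start,
                                endTime=end, queryString=query)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_s)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    return [flatten_row(row) for row in response["results"]]


def flatten_row(row: list[dict[str, str]]) -> dict[str, str]:
    """Convert Insights' [{'field': ..., 'value': ...}, ...] into a plain dict."""
    return {cell["field"]: cell["value"] for cell in row}
```

For the query above, call run_insights_query("/aws/lambda/esp-CertificateRegister", 'fields @timestamp, @message | sort @timestamp asc | filter @message like "<request_id>"'). The same helper works for the other log groups listed in this guide.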
Step 4: Check if the confirmation email was blocked
The confirmation email is sent via AWS SES. If the sender identity is not verified in SES for your deployment, emails may be silently dropped. The CLI checks SES status during register, so look for any SES warning in the register output.
Also check AWS Console → SES → Verified identities to confirm the sender email is verified. If it is not, verify it and re-run the registration job.
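The console check can also be scripted with the SES GetIdentityVerificationAttributes call. This minimal sketch assumes boto3 is configured for the same region as the deployment's SES setup. The sender address is a placeholder you must supply. If your deployment verifies a domain rather than a single address, pass the domain instead.

```python
def ses_status(sender: str) -> str:
    """Return the SES verification status of an identity, or 'NotFound'."""
    import boto3  # lazy import so verification_status() is usable without boto3

    ses = boto3.client("ses")
    response = ses.get_identity_verification_attributes(Identities=[sender])
    return verification_status(response, sender)


def verification_status(response: dict, identity: str) -> str:
    """Extract VerificationStatus (e.g. 'Success', 'Pending') for one identity."""
    entry = response.get("VerificationAttributes", {}).get(identity)
    return entry["VerificationStatus"] if entry else "NotFound"
```

Anything other than "Success" means the sender is not usable yet.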
Registration Job Failed or Partial Failures
Step 1: Get the overall failure summary
Check the status via CLI or DynamoDB as described above. Note the failed_count and completed_count fields in the admin_node_registration_requests table.
Step 2: Find which specific nodes failed
Go to AWS Console → DynamoDB → Tables → node_manufacturing_errors.
Query with partition key (request_id): the request ID.
This table contains one entry per failed node, with fields:
- node_id: which node failed
- error: the error message from AWS IoT Core or the batch container
- request_id: links back to the registration job
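For large batches, the console view is paginated. You can collect every failed node with a script instead. The minimal sketch below assumes boto3 and the table and key names above. The pagination loop is kept separate from the AWS call so that it can be tested without AWS.

```python
from typing import Callable, Iterator, Optional


def iter_failed_nodes(query_page: Callable[[Optional[dict]], dict]) -> Iterator[dict]:
    """Yield every item across paginated DynamoDB Query responses."""
    start_key = None
    while True:
        page = query_page(start_key)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:  # no more pages
            return


def dynamodb_query_page(request_id: str, table_name: str = "node_manufacturing_errors"):
    """Build a query_page callable for the node_manufacturing_errors table."""
    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table(table_name)

    def query_page(start_key: Optional[dict]) -> dict:
        kwargs = {"KeyConditionExpression": Key("request_id").eq(request_id)}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        return table.query(**kwargs)

    return query_page
```

Example: failed = {i["node_id"]: i.get("error") for i in iter_failed_nodes(dynamodb_query_page("<request_id>"))}.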
Step 3: Check the AWS Batch job logs
The bulk registration runs inside an AWS Batch container. The container logs are the most detailed source of per-node errors.
- Go to AWS Console → Batch → Jobs.
- Filter by Job queue: thing-certificate-registration.
- Find your job by checking the submission time (matches request_timestamp in DynamoDB).
- Click the job → click Log stream to open the CloudWatch log stream.
The log stream is under the log group /aws/batch/job. Each node's registration attempt is logged here with the outcome.
What to look for in Batch logs:
| Log message | Meaning |
|---|---|
"Thing already exists" | A node with this ID is already registered. Use --force flag to allow re-registration |
"Certificate is already Provisioned" | The same certificate was registered before. Use --force |
"Error in registering certificate" | The certificate PEM is malformed or invalid. Regenerate the certificate for this node |
"Invalid Certificate" | Certificate format error. Check for truncated PEM data in the CSV |
"Error in creating thing" | AWS IoT Core CreateThing failed. Check IAM role permissions for the Batch job |
"Node limit exceeded" | Your deployment's licensed node count is exhausted. Contact Espressif to increase the limit |
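In a long log stream, it helps to count how often each message appears. The minimal sketch below matches the messages from the table above as case-sensitive substrings. It assumes you have exported the /aws/batch/job stream to plain text lines, and the exact log wording may differ from the table.

```python
from collections import Counter
from typing import Iterable

# Messages from the table above, mapped to the suggested action.
KNOWN_MESSAGES = {
    "Thing already exists": "re-run with --force",
    "Certificate is already Provisioned": "re-run with --force",
    "Error in registering certificate": "regenerate the node certificate",
    "Invalid Certificate": "check for truncated PEM data",
    "Error in creating thing": "check Batch job IAM permissions",
    "Node limit exceeded": "contact Espressif about the node limit",
}


def tally_batch_log(lines: Iterable[str]) -> Counter:
    """Count how many log lines contain each known failure message."""
    counts: Counter = Counter()
    for line in lines:
        for message in KNOWN_MESSAGES:
            if message in line:
                counts[message] += 1
    return counts
```

If one cause dominates, for example "Thing already exists", a single --force re-run may resolve most failures.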
Step 4: Re-register failed nodes
After identifying and fixing the root cause:
- Extract the failed node_id values from the node_manufacturing_errors table.
- Create a new CSV containing only the failed nodes (with their certificates from node_details/).
- Re-run register --inputfile <new_csv> --force to register them without failing on any already-registered nodes.
The --force flag tells the server to skip duplicate node errors and continue registering remaining nodes. Use it when re-running a partially failed job.
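If the original node_certs.csv from common/ is still available, you can filter it instead of assembling rows by hand from node_details/. The minimal sketch below assumes that file has node_id and certs columns, as described earlier. write_retry_csv is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Iterable


def write_retry_csv(source: Path, dest: Path, failed_ids: Iterable[str]) -> int:
    """Copy the header and only the rows whose node_id is in failed_ids; return rows written."""
    wanted = set(failed_ids)
    with source.open(newline="") as src, dest.open("w", newline="") as out:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        written = 0
        for row in reader:
            if row["node_id"] in wanted:
                writer.writerow(row)  # multi-line PEM values are re-quoted automatically
                written += 1
    return written
```

Compare the returned count with the number of failed IDs. A shortfall means some failed nodes are not in this CSV.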
Registration Job Stuck or Timed Out
The AWS Batch job has a maximum timeout of 10 hours (36000 seconds). For very large batches, the job can run close to this limit.
Step 1: Check the AWS Batch job status
- Go to AWS Console → Batch → Jobs.
- Filter by job queue thing-certificate-registration.
- Find the job matching your request_id (visible in the job name or environment variables).
| Job Status | Meaning |
|---|---|
SUBMITTED / PENDING | Job is queued, waiting for a compute instance |
RUNNABLE | Job is waiting for compute capacity in the environment |
STARTING / RUNNING | Job is actively processing |
SUCCEEDED | All nodes processed |
FAILED | Container exited with a non-zero code or hit the 10-hour timeout |
If the job is stuck in PENDING or RUNNABLE for more than 10–15 minutes, the compute environment may not have capacity. Check:
- AWS Console → Batch → Compute environments → ThingCertificateRegister: verify the environment is ENABLED and VALID.
- Check whether the EC2 service quota for the instance type has been reached in your region.
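To review all jobs in the queue at once, you can use a script like the minimal sketch below. It uses boto3's AWS Batch list_jobs paginator and assumes the queue name shown above. Batch reports timestamps in epoch milliseconds. The 15-minute threshold follows the guidance above. The 1% slack on the 10-hour timeout is an arbitrary allowance for scheduling delay.

```python
import time
from typing import Optional

STUCK_STATES = ("SUBMITTED", "PENDING", "RUNNABLE")
STUCK_AFTER_S = 15 * 60  # "more than 10–15 minutes" in the steps above
TIMEOUT_S = 36000        # the 10-hour job timeout


def classify_job(summary: dict, now_ms: Optional[int] = None) -> str:
    """Give a hint for one entry of batch list_jobs()['jobSummaryList']."""
    now_ms = now_ms if now_ms is not None else int(time.time() * 1000)
    status = summary["status"]
    if status in STUCK_STATES and now_ms - summary["createdAt"] > STUCK_AFTER_S * 1000:
        return "queued too long: check the compute environment and EC2 quotas"
    if status == "FAILED" and "startedAt" in summary and "stoppedAt" in summary:
        ran_s = (summary["stoppedAt"] - summary["startedAt"]) / 1000
        if ran_s >= TIMEOUT_S * 0.99:
            return "likely hit the 10-hour timeout: split the CSV"
        return f"failed after {ran_s:.0f}s: read the container log"
    return status.lower()


def list_queue_jobs(queue: str = "thing-certificate-registration"):
    """Yield job summaries in every status for the registration queue."""
    import boto3

    batch = boto3.client("batch")
    paginator = batch.get_paginator("list_jobs")
    for status in ("SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
                   "RUNNING", "SUCCEEDED", "FAILED"):
        for page in paginator.paginate(jobQueue=queue, jobStatus=status):
            yield from page["jobSummaryList"]
```

Example: for job in list_queue_jobs(): print(job["jobName"], classify_job(job)).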
Step 2: Check for Batch job timeout
If the job status is FAILED and the job ran for the full 10 hours (36000 seconds), it hit the timeout. This typically happens with very large batches (tens of thousands of nodes).
Fix:
- Split the CSV into smaller batches and register each separately.
- The recommended batch size is 10,000–20,000 nodes per job.
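Splitting can be done with the standard csv module, as in the minimal sketch below. It assumes the CSV fits in memory. The default of 10,000 rows is the lower end of the recommended range, and each part keeps the original header so that it can be passed to register --inputfile on its own.

```python
import csv
from pathlib import Path


def split_csv(source: Path, out_dir: Path, batch_size: int = 10_000) -> list[Path]:
    """Split a node_certs.csv into files of at most batch_size rows, each with the header."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    with source.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    parts = []
    for start in range(0, len(rows), batch_size):
        part = out_dir / f"{source.stem}_part{start // batch_size + 1}.csv"
        with part.open("w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(rows[start:start + batch_size])
        parts.append(part)
    return parts
```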
Step 3: Check CloudWatch for the Batch container logs
Go to CloudWatch → Log groups → /aws/batch/job and find the log stream for the failed job.
Look for:
- The last
completed_countlogged before the job was killed — this tells you how many nodes were registered before the timeout. - Any specific error that caused the container to exit prematurely (e.g., DynamoDB throttling, IoT API rate limits).
Step 4: Check the DynamoDB request record
Check admin_node_registration_requests for the completed_count at the time of failure. Nodes with a lower index than completed_count are registered. Re-register only the remaining nodes using the --force flag.
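The minimal sketch below writes the remaining rows to a new CSV. It follows the rule in the step above and assumes that the job processes rows in CSV order, so the first completed_count data rows are already registered. Because --force skips duplicates, re-sending a few already-registered rows is harmless. Still, cross-check against node_manufacturing_errors and nodes_v3 if the counts look inconsistent.

```python
import csv
from pathlib import Path


def write_remaining_rows(source: Path, dest: Path, completed_count: int) -> int:
    """Write the header plus every data row from index completed_count onward.

    Assumes rows were processed in CSV order, so the first completed_count
    rows are the ones already registered.
    """
    with source.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        remaining = list(reader)[completed_count:]
    with dest.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(remaining)
    return len(remaining)
```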
Nodes Not Visible on the RainMaker Dashboard
Even after a successful bulk registration, nodes may not be visible on the dashboard until the device connects and sends its configuration. There are two distinct cases.
Case A: Node Registered but Never Appeared on Dashboard
Bulk registration creates the IoT Thing and certificate in AWS IoT Core and writes a record to DynamoDB nodes_v3. However, the full device configuration (name, type, firmware version, parameters) is only stored when the device itself publishes its config after first boot.
Step 1: Verify the node exists in DynamoDB
Go to AWS Console → DynamoDB → Tables → nodes_v3.
Query with partition key node_id.
- Entry exists → Node is registered in the system. The dashboard should show it (possibly with limited info until the device publishes config). If it doesn't appear, check admin dashboard permissions.
- No entry found → The bulk registration did not complete for this node. Check node_manufacturing_errors for this node ID and re-register it.
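To check many node IDs at once, you can use DynamoDB BatchGetItem, which accepts up to 100 keys per request. The minimal sketch below assumes boto3 and a table named exactly nodes_v3 with partition key node_id. It retries UnprocessedKeys immediately; production code should back off between retries.

```python
from typing import Callable, Iterable


def find_unregistered(node_ids: Iterable[str],
                      fetch_existing: Callable[[list[str]], set[str]],
                      chunk: int = 100) -> list[str]:
    """Return node IDs that fetch_existing() does not report as present."""
    ids = list(dict.fromkeys(node_ids))  # de-duplicate, keep order
    missing = []
    for i in range(0, len(ids), chunk):
        batch = ids[i:i + chunk]
        present = fetch_existing(batch)
        missing.extend(n for n in batch if n not in present)
    return missing


def dynamodb_fetch_existing(table_name: str = "nodes_v3") -> Callable[[list[str]], set[str]]:
    """Build a fetch_existing callable backed by DynamoDB BatchGetItem."""
    import boto3

    dynamodb = boto3.resource("dynamodb")

    def fetch(batch: list[str]) -> set[str]:
        request = {table_name: {"Keys": [{"node_id": n} for n in batch],
                                "ProjectionExpression": "node_id"}}
        found: set[str] = set()
        while request:  # loop until DynamoDB reports no unprocessed keys
            response = dynamodb.batch_get_item(RequestItems=request)
            found.update(item["node_id"] for item in response["Responses"].get(table_name, []))
            request = response.get("UnprocessedKeys") or {}
        return found

    return fetch
```

Example: find_unregistered(ids_from_node_ids_csv, dynamodb_fetch_existing()) returns the node IDs to look up in node_manufacturing_errors.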
Step 2: Check if the node is in the pending registration table
Go to DynamoDB → Tables → admin_pending_registration_nodes.
Query with:
- Partition key (user_id): the admin user ID
- Sort key (node_id): the node ID
If the entry exists here but not in the admin dashboard view, the dashboard may need a refresh, or the node is still waiting for the device to send its first config.
Case B: Device Booted but Node Config Not Updating
After the device boots and connects to MQTT, it should publish its configuration to the topic node/<node_id>/config. This triggers the esp-RegisterDevice Lambda, which stores the device config in DynamoDB.
Step 1: Verify the device published its config
Go to CloudWatch → Logs Insights, select /aws/lambda/esp-RegisterDevice, and run:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like "<node_id>"
- Entries found with no errors → Config was received and stored. Refresh the dashboard.
- Entries found with errors → Note the error and check the device's config payload format.
- No entries found → The device did not publish its config, or the MQTT rule esp_node_config is not routing messages to the SQS queue. See Node Registered but Device Cannot Connect.
Step 2: Check the SQS queue for stuck messages
If the device is publishing but the Lambda is not processing:
- Go to AWS Console → SQS → esp-deviceRegisterSQS.
- Check Messages available and Messages in flight.
- If there are messages in the Dead Letter Queue (esp-FailedMessageDLQ), click Send and receive messages → Poll for messages to inspect them.
Failed messages in the DLQ indicate the esp-RegisterDevice Lambda is failing to process them. Check the Lambda logs for errors.
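The same counts are available through the SQS GetQueueAttributes API. The minimal sketch below assumes boto3 and the exact queue names above. Some deployments may add prefixes or suffixes to these names. SQS returns attribute values as strings.

```python
def queue_depth(queue_name: str) -> dict[str, int]:
    """Return visible and in-flight message counts for an SQS queue."""
    import boto3  # lazy import so parse_depth() works without boto3

    sqs = boto3.client("sqs")
    url = sqs.get_queue_url(QueueName=queue_name)["QueueUrl"]
    attrs = sqs.get_queue_attributes(
        QueueUrl=url,
        AttributeNames=["ApproximateNumberOfMessages",
                        "ApproximateNumberOfMessagesNotVisible"],
    )["Attributes"]
    return parse_depth(attrs)


def parse_depth(attrs: dict[str, str]) -> dict[str, int]:
    """Map SQS's string attributes to integer 'available' / 'in_flight' counts."""
    return {
        "available": int(attrs.get("ApproximateNumberOfMessages", "0")),
        "in_flight": int(attrs.get("ApproximateNumberOfMessagesNotVisible", "0")),
    }
```

Call queue_depth("esp-deviceRegisterSQS") and queue_depth("esp-FailedMessageDLQ"). A non-zero available count on the DLQ points to the Lambda processing failures described above.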
Step 3: Check the Lambda log for config processing errors
Go to CloudWatch → Logs Insights, select /aws/lambda/esp-RegisterDevice, and run:
fields @timestamp, @message
| filter @message like /error/i or @message like /failed/i
| sort @timestamp desc
| limit 50
Look for JSON parse errors or DynamoDB write failures that could cause the node config to not be stored.
Node Registered but Device Cannot Connect
If the node is registered in DynamoDB and AWS IoT Core, but the physical device cannot establish an MQTT connection:
Step 1: Verify the IoT Thing and certificate exist in AWS IoT Core
- Go to AWS Console → IoT Core → Manage → All devices → Things.
- Search for the node ID.
- Click the thing → go to Certificates tab.
- Confirm a certificate is attached and its status is Active.
If the certificate status is Inactive or Revoked, the device cannot connect.
Fix: Activate the certificate:
- Click the certificate → Actions → Activate.
If no certificate is attached, the bulk registration may have created the Thing but failed to attach the certificate. Check node_manufacturing_errors for this node.
Step 2: Verify the IoT policy is attached
On the same certificate page, go to the Policies tab. Confirm the esp-rainmaker-iot-policy (or equivalent policy for your deployment) is attached.
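If no policy is attached, nothing grants the device permission to connect, so AWS IoT Core rejects its MQTT connection and any publish or subscribe attempts. Look for AUTH_ERROR or FORBIDDEN_ACCESS disconnect reasons in the connection logs (Step 4).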
Fix: Attach the policy:
- Click Actions → Attach policy → select esp-rainmaker-iot-policy.
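Steps 1 and 2 can also be checked with the AWS IoT API. The minimal sketch below uses boto3's list_thing_principals, describe_certificate, and list_attached_policies. It assumes the thing name equals the node ID, as the console steps above imply, and that each thing has only a few principals and policies, so pagination is ignored. A healthy node shows a certificate with status ACTIVE and your deployment's IoT policy in the list. An empty report means no certificate is attached.

```python
def cert_id_from_arn(arn: str) -> str:
    """Extract the certificate ID from arn:aws:iot:<region>:<account>:cert/<id>."""
    return arn.rsplit("cert/", 1)[-1]


def inspect_thing(thing_name: str) -> list[dict]:
    """Report status and attached policies for each certificate attached to a thing."""
    import boto3  # lazy import so cert_id_from_arn() works without boto3

    iot = boto3.client("iot")
    report = []
    for principal in iot.list_thing_principals(thingName=thing_name)["principals"]:
        if ":cert/" not in principal:
            continue  # skip non-certificate principals
        cert_id = cert_id_from_arn(principal)
        status = iot.describe_certificate(certificateId=cert_id)["certificateDescription"]["status"]
        policies = [p["policyName"]
                    for p in iot.list_attached_policies(target=principal)["policies"]]
        report.append({"certificate_id": cert_id, "status": status, "policies": policies})
    return report
```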
Step 3: Verify the device is using the correct certificate and key
The certificate (node.crt) and private key (node.key) must be flashed to the device from the same batch as the one registered with the cloud. If the device firmware uses different certificate files, it will not be able to authenticate.
Check that the NVS binary (bin/node-<idx>-<node_id>.bin) was flashed to the correct device.
Step 4: Check node connection logs
Go to CloudWatch → Logs Insights, select /aws/lambda/esp-ConnectionNode, and filter by <node_id>:
fields @timestamp, @message
| sort @timestamp desc
| filter @message like "<node_id>"
Look for AUTH_ERROR or FORBIDDEN_ACCESS disconnect reasons, which indicate a certificate or policy issue.
See Debugging Node Connection Issues for a full guide on MQTT connection problems.
Step 5: Verify the device is connecting to the correct MQTT endpoint
The device must connect to the MQTT endpoint of your private RainMaker deployment, not the default Espressif endpoint. Confirm the endpoint.txt file generated during the generate step was used when building the firmware's NVS partition.
Run:
cat <outdir>/<date>/Mfg-<N>/common/endpoint.txt
Compare this with the MQTT host your device is configured to use.
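You can compare endpoint.txt against the live broker by opening a mutual-TLS connection with a node's node.crt and node.key. The minimal sketch below uses only the standard library and assumes the broker accepts MQTT over TLS on port 8883. It also assumes the broker's server certificate chains to a CA in the system trust store, which is the case for Amazon Root CA-based AWS IoT endpoints. A completed handshake shows the host is reachable and trusted. It does not prove the device certificate is accepted, because AWS IoT Core may still reject an unregistered or inactive certificate, so keep Steps 1 and 4 as the authoritative checks.

```python
import socket
import ssl
from pathlib import Path


def read_endpoint(endpoint_file: Path) -> str:
    """Read the broker hostname from endpoint.txt, tolerating a scheme or port."""
    host = endpoint_file.read_text().strip()
    host = host.split("://", 1)[-1]  # drop an optional scheme such as mqtts://
    return host.split("/", 1)[0].split(":", 1)[0]


def tls_probe(host: str, certfile: Path, keyfile: Path, port: int = 8883,
              timeout: float = 10.0) -> str:
    """Open a mutual-TLS connection with the node's cert/key; return the TLS version."""
    context = ssl.create_default_context()  # verifies the broker against system CAs
    context.load_cert_chain(certfile=str(certfile), keyfile=str(keyfile))
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version() or "unknown"
```

Example: tls_probe(read_endpoint(Path("<outdir>/<date>/Mfg-<N>/common/endpoint.txt")), Path("node.crt"), Path("node.key")). An ssl.SSLCertVerificationError or socket timeout points to a wrong or unreachable endpoint.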
Error Code Reference
Bulk Node Creation Errors (106xxx)
| Error Code | Message | Likely Cause and Fix |
|---|---|---|
| 106001 | Node count should be > 0 and ≤ 10000 | Use --count between 1 and 10,000 per request |
| 106004 | Request ID is not valid | The request_id passed to getcertstatus is wrong or expired |
| 106007 | URL requested is expired | The S3 pre-signed URL timed out (1-hour validity). Re-run register |
| 106008 | Error fetching pre-signed URL | Backend error. Retry the registration command |
| 106009 | File name is missing | Provide --inputfile with a valid CSV path |
| 106010 | Error submitting thing registration job | AWS Batch job submission failed. Check if the Batch compute environment is healthy |
| 106011 | File md5 is missing | The CLI could not compute the MD5 of the CSV. Verify the file is readable |
| 106016 | No registration request in progress | No active job for this request_id. The job may have already completed or the ID is wrong |
| 106020 | Total registered nodes exceeds limit | Deployment's licensed node limit reached. Contact Espressif to increase the quota |
| 106026 | Invalid CSV file | The uploaded CSV is malformed. Validate the file format |
| 106031 | CSV must have columns: certs, node_id or CN | Ensure the CSV has node_id and certs columns |
| 106033 | Node ID does not match certificate CN | Certificate was generated for a different node ID. Regenerate certificates |
| 106036 | Invalid node policy | Valid values: mqtt, videostream |
| 106037 | node_policies cannot be used with update_nodes | Remove --node_policies when using --update_nodes |
Self-Claim / Device Registration Errors (200xxx)
| Error Code | Message | Likely Cause and Fix |
|---|---|---|
| 200001 | MAC Address is missing | mac_addr not provided to /claim/node |
| 200009 | Claim does not exist | Node was not pre-claimed or the MAC address lookup failed |
| 200019 | Error in creating thing | AWS IoT Core CreateThing call failed. Check IAM permissions for the claim Lambda |
| 200020 | Certificate is already Provisioned | This certificate is already registered. Use --force to re-register |
| 200021 | Error in registering certificate | Certificate PEM is invalid or the IoT API returned an error |
| 200022 | Invalid Certificate | The certificate data is malformed or expired |
| 200036 | Invalid node policy | Valid policies: mqtt, videostream |
CloudWatch Log Groups Reference
| Log Group | When to Use |
|---|---|
| /aws/lambda/esp-CertificateRegister | Check registration job submission, pre-signed URL generation, job trigger errors |
| /aws/lambda/esp-NodeIdGeneration | Check node ID generation status when using cloud-based ID generation |
| /aws/lambda/esp-RegisterDevice | Check if node config MQTT message was received and stored |
| /aws/lambda/esp-RegisterNode | Check HTTPS-based node config registration |
| /aws/lambda/esp-createAndRegisterThing | Check self-claim device registration errors |
| /aws/lambda/esp-ConnectionNode | Check device MQTT connect/disconnect events |
| /aws/batch/job | Check detailed per-node logs from the bulk registration Batch container |
DynamoDB Tables Reference
| Table | When to Check | Key to Query |
|---|---|---|
| admin_node_registration_requests | Check bulk job status, progress counts | user_id (partition), request_id (sort) |
| node_manufacturing_errors | Find which specific nodes failed in a batch job | request_id (partition), node_id (sort) |
| nodes_v3 | Verify a node is registered in the system | node_id (partition) |
| admin_pending_registration_nodes | Check nodes registered by admin but not yet claimed by a user | user_id (partition), node_id (sort) |